Psychoacoustical evaluation of the pitch-synchronous overlap-and-add speech-waveform manipulation technique using single-formant stimuli.
Authors
Abstract
This article presents two experiments dealing with a psychoacoustical evaluation of the pitch-synchronous overlap-and-add (PSOLA) technique. This technique has been developed for modification of duration and fundamental frequency of speech and is based on simple waveform manipulations. Both experiments were aimed at deriving the sensitivity of the auditory system to the basic distortions introduced by PSOLA. In experiment I, manipulation of fundamental frequency was applied to synthetic single-formant stimuli under minimal stimulus uncertainty, level roving, and formant-frequency roving. In experiment II, the influence of the positioning of the so-called "pitch markers" was studied. Depending on the formant and fundamental frequency, experimental data could be described reasonably well by either a spectral intensity-discrimination model or a temporal model based on detecting changes in modulation of the output of a single auditory filter. Generally, the results were in line with psychoacoustical theory on the auditory processing of resolved and unresolved harmonics.
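As a rough illustration of the "simple waveform manipulations" mentioned in the abstract, the sketch below applies a textbook time-domain PSOLA step to a synthetic single-formant stimulus. It is not the authors' procedure. The function names and every parameter value (16 kHz sampling rate, a 1000 Hz formant with 100 Hz bandwidth, an F0 change from 100 Hz to 125 Hz) are assumptions chosen only for the example, and the pitch markers are simply placed on the excitation pulses.

```python
import math


def single_formant_stimulus(f0_hz, formant_hz, bandwidth_hz, fs, n_samples):
    """Pulse train at f0_hz exciting one damped sinusoid (a single 'formant')."""
    period = round(fs / f0_hz)
    signal = [0.0] * n_samples
    for start in range(0, n_samples, period):
        for m in range(n_samples - start):
            decay = math.exp(-math.pi * bandwidth_hz * m / fs)
            if decay < 1e-6:  # the rest of this response is negligible
                break
            signal[start + m] += decay * math.sin(2 * math.pi * formant_hz * m / fs)
    return signal, period


def psola_shift_f0(signal, pitch_marks, old_period, new_period):
    """Overlap-add Hann-windowed, two-period segments at a new marker spacing."""
    n = len(signal)
    half = old_period  # each segment spans two analysis periods
    window = [0.5 - 0.5 * math.cos(math.pi * k / half) for k in range(2 * half + 1)]
    out = [0.0] * n
    t = pitch_marks[0]  # current synthesis marker
    while t < n:
        # Reuse the analysis segment whose marker lies closest in time,
        # which keeps the overall duration unchanged.
        src = min(pitch_marks, key=lambda mark: abs(mark - t))
        for i in range(-half, half + 1):
            s, d = src + i, t + i
            if 0 <= s < n and 0 <= d < n:
                out[d] += window[i + half] * signal[s]
        t += new_period
    return out


# Illustrative (assumed) values: raise F0 from 100 Hz to 125 Hz at 16 kHz.
FS = 16000
stimulus, period = single_formant_stimulus(100, 1000, 100, FS, 3200)
marks = list(range(0, len(stimulus), period))  # markers on the excitation pulses
new_period = round(FS / 125)
shifted = psola_shift_f0(stimulus, marks, period, new_period)
```

When `new_period` equals `old_period`, the overlapping Hann windows sum to one and the input is reproduced between the first and last markers. Changing the marker spacing changes the repetition rate, and therefore F0, while each windowed segment still carries the formant ringing. Offsetting `marks` by a constant would be one simple way to explore the marker-positioning question raised for experiment II.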
Similar resources
Epoch-Synchronous Overlap-Add (ESOLA) for Time- and Pitch-Scale Modification of Speech Signals
Time- and pitch-scale modifications of speech signals find important applications in speech synthesis, playback systems, voice conversion, learning/hearing aids, etc. There is a requirement for computationally efficient and real-time implementable algorithms. In this paper, we propose a high quality and computationally efficient time- and pitch-scaling methodology based on the glottal closure inst...
Novel speech duration modifier for packet based communication system
In this paper, we propose a real-time method for duration modification of speech for packet based communication system. While there is rich literature available on duration modification, it fails to clearly address the issues in real-time implementation of the same. Most of the duration modification methods rely on accurate estimation of pitch marks, which is not feasible in a real-time scenari...
Statistical Variation Analysis of Formant and Pitch Frequencies in Anger and Happiness Emotional Sentences in Farsi Language
Setup of an emotion recognition or emotional speech recognition system is directly related to how emotion changes the speech features. In this research, the influence of emotion on the anger and happiness was evaluated and the results were compared with the neutral speech. So the pitch frequency and the first three formant frequencies were used. The experimental results showed that there are lo...
Prosody Modification of Standard Arabic Speech Using Combining Synchronous Overlap and Add With Fixed-Synthesis Algorithm and Multi Level Discrete Wavelet Transform
Problem statement: The objective of prosody modification is to change the amplitude, duration and pitch (F0) of speech segments without altering their spectral envelope. Applications are numerous, including text-to-speech synthesis, transformation of voice characteristics and foreign language learning. Several approaches have been developed in the literature to achieve this goal. The main restr...
Rule-based Emotion Synthesis Using Concatenated Speech
Concatenative speech synthesis is increasing in popularity, as it offers higher quality output than previous formant synthesisers. However, because it is based on recorded speech units, concatenative synthesis offers a lesser degree of parametric control during resynthesis. Consequently, adding pragmatic effects such as different speaking styles and emotions at the synthesis stage is fundamentally more...
Journal title:
- The Journal of the Acoustical Society of America
Volume 101, Issue 4
Pages -
Publication date 1997